Abstract-Dynamics and granularity in data mining grid scheduling (DMGS) are greatly influenced by resources. DMGS is defined as a workflow and combined with a variable-structure scheduling algorithm based on colored-hierarchy Petri nets (CHPN), which decomposes the scheduling problem into a job scheduling layer, a sub-job scheduling layer, a task scheduling layer and a sub-task scheduling layer. Across the four layers, job scheduling is decomposed from top to bottom, and each layer is based on Petri net state transfer. The results show that CHPN can be effective in DMGS.
I. Introduction

Data mining grid scheduling (DMGS) is a dynamic scheduling process in which the granularity of jobs may fluctuate with the available resources [1], so jobs need to be redistributed. A job usually includes several related operations or tasks, which makes DMGS a complex process [2]. A clear and intuitive scheduling model is therefore necessary to define the process and to analyse and evaluate its performance.

DMGS defines a job as a workflow that can be integrated with other algorithms, and it has therefore been widely studied and applied [3]. A scheduling model based on Petri nets can easily be adjusted in real time according to resource allocation, supporting the dynamic merging, splitting, updating and modification of sub-tasks [4].

A Petri net is one of several mathematical modelling languages for describing distributed systems; it is a directed bipartite graph [5-8]. A Petri net consists of places, transitions and directed arcs, each with a specific meaning [9]. The nodes represent transitions and places, and the directed arcs indicate which places are pre- and/or post-conditions of which transitions.

Arcs in a Petri net run from a place to a transition or vice versa, never between places or between transitions [10]. There are two types of places: input places, from which an arc runs to a transition, and output places, to which an arc runs from a transition. Places contain a natural number of tokens, and a distribution of tokens over the places is called a marking [11, 12].
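The structural rules above (arcs only between a place and a transition, input and output places, tokens held in places) can be sketched in Python as follows. This is a minimal illustration rather than code from this paper; the class `PetriNet`, its methods and the place and transition names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PetriNet:
    """A place/transition net stored as a directed bipartite graph (hypothetical helper)."""

    places: set[str]
    transitions: set[str]
    arcs: set[tuple[str, str]] = field(default_factory=set)  # (source, target)

    def add_arc(self, source: str, target: str) -> None:
        """Add an arc, rejecting place-place and transition-transition arcs."""
        p_to_t = source in self.places and target in self.transitions
        t_to_p = source in self.transitions and target in self.places
        if not (p_to_t or t_to_p):
            raise ValueError(f"arc {source}->{target} must join a place and a transition")
        self.arcs.add((source, target))

    def input_places(self, t: str) -> set[str]:
        """Places with an arc running to transition t."""
        return {s for (s, d) in self.arcs if d == t}

    def output_places(self, t: str) -> set[str]:
        """Places with an arc running from transition t."""
        return {d for (s, d) in self.arcs if s == t}


net = PetriNet(places={"p1", "p2"}, transitions={"t1"})
net.add_arc("p1", "t1")
net.add_arc("t1", "p2")
marking = {"p1": 1, "p2": 0}  # a marking: a natural number of tokens per place
print(net.input_places("t1"), net.output_places("t1"), marking)
```

Calling `net.add_arc("p1", "p2")` raises `ValueError`, which is how the bipartite restriction is enforced in this sketch.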

Execution of a Petri net is nondeterministic [5, 13, 14]: when multiple transitions are enabled at the same time, any one of them may fire, and an enabled transition may fire but is not required to. Multiple tokens may be present anywhere in the net at any time [12]. Petri nets are therefore well suited for modelling the concurrent activities of distributed systems such as planning and scheduling systems, process management systems and warehouse distribution systems [5, 6, 10, 12, 13, 15].
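The nondeterministic firing rule can be illustrated with a small conflict net in which two transitions compete for one token. The sketch below is an assumption-laden illustration, not the paper's model: the net (`PRE`, `POST`) and the helper names are hypothetical, and a random choice stands in for nondeterminism.

```python
from __future__ import annotations

import random

# Arc weights of a hypothetical conflict net: t1 and t2 both need the single
# token in p1, so at most one of them can fire.
PRE = {"t1": {"p1": 1}, "t2": {"p1": 1}}
POST = {"t1": {"p2": 1}, "t2": {"p3": 1}}


def enabled(marking: dict[str, int], t: str) -> bool:
    """t is enabled when every input place holds at least the arc weight."""
    return all(marking.get(p, 0) >= w for p, w in PRE[t].items())


def fire(marking: dict[str, int], t: str) -> dict[str, int]:
    """Consume tokens from the input places and produce tokens in the output places."""
    new = dict(marking)
    for p, w in PRE[t].items():
        new[p] -= w
    for p, w in POST[t].items():
        new[p] = new.get(p, 0) + w
    return new


def step(marking: dict[str, int], rng: random.Random):
    """Fire one enabled transition, chosen at random to model nondeterminism."""
    candidates = sorted(t for t in PRE if enabled(marking, t))
    if not candidates:
        return marking, None  # no transition is enabled
    t = rng.choice(candidates)
    return fire(marking, t), t


m0 = {"p1": 1, "p2": 0, "p3": 0}
m1, fired = step(m0, random.Random())
print(fired, m1)  # either t1 or t2 fired, never both
```

Both transitions are enabled in `m0`, but firing either one removes the shared token and disables the other.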



This paper presents a DMGS algorithm based on colored-hierarchy Petri nets (CHPN) to monitor job assignment and the scheduling process accurately and to evaluate and analyse performance. The algorithm decomposes the scheduling process from top to bottom: the top level describes the scheduling process, while the lower levels focus on scheduling granularity. The purpose of this hierarchical approach is to reduce scheduling time and algorithm complexity [16, 17, 18, 19]. Through the CHPN approach, the algorithm offers advantages such as high coupling, good performance and effectiveness.

II. Colored-hierarchy Petri Nets (CHPN) 

CHPN scheduling defines the task planning and scheduling network through top-down decomposition [20, 21]. Its purpose is to reduce complexity and to adapt the network structure to the granularity so as to support dynamic scheduling [10]. The following definitions are used.

Definition 1: A CHPN is defined as a set of eight elements:

$$\mathrm{Grid} = \{P, F, D, C, I, O, K, M\}$$

where

P is the set of places, which keep the jobs with different statuses and granularities [12].

$T = t^{s} \cup t^{c}$ is the set of status transfers, where $t^{s}$ denotes the simple status transfers and $t^{c}$ the complex status transfers. F is the set of arcs, which determines the inputs and outputs [22].

D is the set of colors that differentiate jobs at different granularities, such as jobs, sub-jobs, tasks and sub-tasks [23].

C denotes the color function

$$C: P \cup T \to \Omega(D)$$

where $\Omega(D)$ is the coefficient set of colors.

I and O denote the input and output arc functions of the status transfers, respectively. For every $(p', t') \in P \times T$ they have to meet

$$I(p', t') \in [C(p')_{MS} \to C(t')_{MS}]_{L}$$

where the subscript MS denotes multisets and L denotes linear functions; O must meet the corresponding condition.



K denotes the capacity, which determines the number of jobs that can be processed in parallel in the scheduling network [24]. M denotes the initial status (marking):

$$M: P \to D_{MS}$$

Definition 2: A status transfer $t'$ in Grid moves the network from one status to another only if

$$\forall p' \in {}^{\bullet}t' : M(p') \ge I(p', t') \;\wedge\; \forall p' \in t'^{\bullet} : M(p') + O(p', t') \le K(p')$$

where ${}^{\bullet}t'$ and $t'^{\bullet}$ are the input and output places of $t'$. $p_{start}$ and $p_{end}$ denote the start and end statuses, which satisfy

$${}^{\bullet}p_{start} = \emptyset \;\wedge\; p_{end}^{\bullet} = \emptyset \;\wedge\; p_{start} \ne p_{end}$$

Definition 3: Any complex status transfer $t_i$ in Grid can be extended as a sub-network

$$S\_Grid_i = \{P_i, F_i, D_i, C_i, I_i, O_i, K_i, M_i\}$$

where $T_i = \{t^{i}_{k} \mid 1 \le k \le n\}$ and $P_i = \{p^{i}_{k} \mid 1 \le k \le u\}$.

Table I Explanation of $d_i$

Color | Explanation | Color functions
$d_1$ | Jobs submitted from the users. | $C(p_1) = \{d_1\}$
$d_2$ | Jobs waiting for processing. | $C(p_2) = \{d_2\}$, $C(t_2) = \{d_2, d_9\}$
$d_3$ | Jobs under processing. | $C(p_3) = \{d_3\}$, $C(t_3) = \{d_3\}$
$d_4$ | Jobs waiting for check. | $C(p_4) = \{d_4\}$, $C(t_4) = \{d_4\}$
$d_5$ | Jobs are correct. | $C(p_5) = \{d_5, d_6, d_7\}$, $C(t_5) = \{d_5\}$
$d_6$ | Jobs are completed. | $C(p_5) = \{d_5, d_6, d_7\}$, $C(t_6) = \{d_6\}$
$d_7$ | Jobs have failed. | $C(p_5) = \{d_5, d_6, d_7\}$, $C(t_7) = \{d_7\}$
$d_8$ | Jobs are returned to the users. |
$d_9$ | Job TOKENs, limited in number, that can run in parallel. |

The job status transfer in the job scheduling layer is shown in Fig. 1.
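Definition 2's enabling rule, together with the capacity K and the arc functions I and O from Definition 1, can be sketched in Python with colors held as multisets (`collections.Counter`). Several simplifications are assumptions rather than part of the definitions: I and O are constant color multisets per arc instead of general linear functions, the capacity counts all tokens in a place, and `p_token`, the place holding the TOKEN color $d_9$, is a hypothetical name because that place is not identified in Table I. The example fires $t_2$, whose colors are $\{d_2, d_9\}$: a waiting job starts processing only while a TOKEN is available.

```python
from collections import Counter


def is_enabled(t, M, I, O, K):
    """Definition 2 (sketch): every input place p holds I(p, t), and every
    output place p satisfies |M(p)| + |O(p, t)| <= K(p)."""
    for (p, tt), need in I.items():
        if tt == t and any(M[p][c] < n for c, n in need.items()):
            return False
    for (p, tt), add in O.items():
        if tt == t and sum(M[p].values()) + sum(add.values()) > K[p]:
            return False
    return True


def fire(t, M, I, O, K):
    """Return the marking reached by firing t; M itself is left unchanged."""
    if not is_enabled(t, M, I, O, K):
        raise RuntimeError(f"{t} is not enabled")
    new = {p: Counter(tokens) for p, tokens in M.items()}
    for (p, tt), need in I.items():
        if tt == t:
            new[p].subtract(need)   # consume input colors
    for (p, tt), add in O.items():
        if tt == t:
            new[p].update(add)      # produce output colors
    return new


# Job-layer fragment based on Table I; "p_token" is a hypothetical place name.
M0 = {
    "p2": Counter({"d2": 2}),       # two jobs waiting for processing
    "p_token": Counter({"d9": 1}),  # one TOKEN: one job may run at a time
    "p3": Counter(),                # jobs under processing
}
I = {("p2", "t2"): Counter({"d2": 1}), ("p_token", "t2"): Counter({"d9": 1})}
O = {("p3", "t2"): Counter({"d3": 1})}
K = {"p2": 10, "p_token": 1, "p3": 1}

M1 = fire("t2", M0, I, O, K)
print(M1["p3"], is_enabled("t2", M1, I, O, K))  # one job running, t2 now blocked
```

After one firing, the second waiting job cannot start: the TOKEN place is empty and `p3` has reached its capacity, which is how K and the TOKEN color limit parallelism in this sketch.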

III. DMGS BASED ON CHPN

The DMGS algorithm based on CHPN can be divided into four layers [25, 26, 27]: the job scheduling layer, the sub-job scheduling layer, the task scheduling layer and the sub-task scheduling layer. They are detailed in the following sections.

A. Job Scheduling Layer

This layer mainly handles the execution of jobs, whose statuses are running, waiting, completed and failed. Let

$$\mathrm{Grid} = \{P, F, D, C, I, O, K, M\}$$

where

$$P = \{p_i \mid 1 \le i \le 5\}, \quad T = \{t_i \mid 1 \le i \le 5\}, \quad D = \{d_i \mid 1 \le i \le 9\}$$

Fig. 1. Jobs Status Transfer in Job Scheduling Layer

B. Sub-job Scheduling Layer

Sub-job scheduling comes from the complex status transfer in the job scheduling layer. It mainly includes job analysis, job generation and monitoring. The sub-job scheduling network is defined as

$$S\_Grid_i = \{P_i, F_i, D_i, C_i, I_i, O_i, K_i, M_i\}$$

with $i = 4$, which means

$$P^{4} = \{p^{4}_{start}, p^{4}_{end}, p^{4}_{i} \mid 1 \le i \le 10\}$$

$$T^{4} = \{t^{4}_{i} \mid 1 \le i \le 15\}$$

$$D^{4} = C(p^{4}_{start}) \cup C(p^{4}_{end}) \cup \{d^{4}_{i}\}$$
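Definition 3 and the sub-job layer describe replacing a complex status transfer by a whole sub-network with its own start and end places. A common way to realise such a refinement, assumed here rather than taken from the paper, is to connect the input places of the complex transfer to the sub-network's start place, and its end place back to the output places, through two glue transfers. The function names, the `^4` naming of layered symbols, the glue names `t4_in` and `t4_out`, and the choice of `t4` as the expanded transfer are all illustrative.

```python
def layered(name: str, label: str) -> str:
    """Attach the layer superscript, e.g. ('p_start', '4') -> 'p_start^4'."""
    return f"{name}^{label}"


def expand(arcs, t, subnet_arcs, label):
    """Replace complex transfer t by a sub-network (sketch of Definition 3).

    arcs: set of (source, target) arcs of the parent net.
    subnet_arcs: arcs of the sub-network in local names; it must contain the
        places 'p_start' and 'p_end'.
    label: superscript of the new layer, e.g. '4' or '4,6'.
    The glue transfers t + '_in' and t + '_out' are hypothetical names.
    """
    pre = {s for (s, d) in arcs if d == t}    # input places of t
    post = {d for (s, d) in arcs if s == t}   # output places of t
    t_in, t_out = f"{t}_in", f"{t}_out"
    new = {(s, d) for (s, d) in arcs if t not in (s, d)}  # drop t itself
    new |= {(layered(s, label), layered(d, label)) for (s, d) in subnet_arcs}
    new |= {(p, t_in) for p in pre} | {(t_in, layered("p_start", label))}
    new |= {(layered("p_end", label), t_out)} | {(t_out, p) for p in post}
    return new


parent = {("p3", "t4"), ("t4", "p4")}
sub_job_net = {("p_start", "t1"), ("t1", "p1"), ("p1", "t2"), ("t2", "p_end")}
for arc in sorted(expand(parent, "t4", sub_job_net, "4")):
    print(arc)
```

Because the glue transfers sit between places on both sides, the expanded net stays bipartite, and applying `expand` again with a label such as `"4,6"` mirrors the task and sub-task superscripts used below.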



Table 2 illustrates the indices used in this layer. 

Table II Indices Explanations

Color | Explanation | Color functions
$d^{4}_{1}$ | Jobs are initiated. | $C(p^{4}_{start}) = \{d_4\}$, $C(p^{4}_{end}) = \{d_5, d_6, d_7\}$, $C(t^{4}_{3}) = \{d^{4}_{1}\}$
$d^{4}_{2}$ | Jobs are not initiated. | $C(p^{4}_{1}) = \{d^{4}_{1}, d^{4}_{2}\}$
$d^{4}_{3}$ | Sub-jobs |
$d^{4}_{4}$ | Sub-jobs are running. |
$d^{4}_{5}$ | Sub-jobs are waiting for check. |
$d^{4}_{6}$ | Sub-jobs are correct. | $C(p^{4}_{5}) = \{d^{4}_{6}, d^{4}_{7}, d^{4}_{8}\}$, $C(t^{4}_{8}) = \{d^{4}_{6}\}$
$d^{4}_{7}$ | Sub-jobs are completed. | $C(p^{4}_{5}) = \{d^{4}_{6}, d^{4}_{7}, d^{4}_{8}\}$
$d^{4}_{8}$ | Sub-jobs have failed. |
$d^{4}_{9}$ | Sub-jobs are labelled. |
$d^{4}_{10}$ | Sub-jobs are checked. |
$d^{4}_{11}$ | Sub-jobs are not checked. |
$d^{4}_{12}$ | Sub-job TOKEN. |
$d^{4}_{13}$ | Sub-jobs clean TOKEN. |
$d^{4}_{14}$ | Sub-jobs have been done. |
$d^{4}_{15}$ | Sub-jobs are running with tasks. |



C. Task Scheduling Layer 

The task scheduling layer is the sub-layer of the sub-job scheduling layer. It is extended from the complex status transfer $t^{4}_{6}$ and covers the analysis of tasks and their monitoring. Let $i = 6$, $j = 4$:

$$P^{4,6} = \{p^{4,6}_{start}, p^{4,6}_{end}, p^{4,6}_{i} \mid 1 \le i \le 10\}$$

$$T^{4,6} = \{t^{4,6}_{i} \mid 1 \le i \le 15\}$$

$$D^{4,6} = C(p^{4,6}_{start}) \cup C(p^{4,6}_{end}) \cup d^{4,6}$$

$$d^{4,6} = \{d^{4,6}_{i} \mid 1 \le i \le 15\}$$

The status transfer logic and its relationships are illustrated in Fig. 2.






Fig. 2. Status Transferring and Relationships in Tasks Scheduling Layer 

D. Sub-task Scheduling Layer 

The sub-task scheduling layer is extended from the task scheduling layer according to the complex status transfer in that layer. Its main purpose is to analyse the tasks, arrange and generate sub-tasks, and monitor their statuses and re-allocation [24].



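Set $i = 6$, $j = 5$:

$$P^{4,6,6} = \{p^{4,6,6}_{start}, p^{4,6,6}_{end}, p^{4,6,6}_{i}\}$$

$$T^{4,6,6} = \{t^{4,6,6}_{i} \mid 1 \le i \le 21\}$$

$$D^{4,6,6} = C(p^{4,6,6}_{start}) \cup C(p^{4,6,6}_{end}) \cup d^{4,6,6}$$

$$d^{4,6,6} = \{d^{4,6,6}_{i} \mid 1 \le i \le 15\}$$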



Table 3 reports the explanation of different parameters. 

Table III Explanations of Parameters in Sub-task Scheduling Layer

All colors, places and transfers in this table carry the superscript $4,6,6$.

Color | Explanation | Color functions
$d_1$ | Tasks are not initiated. | $C(p_1) = \{d_1, d_2\}$, $C(t_3) = \{d_1\}$
$d_2$ | Tasks have been initiated. | $C(p_1) = \{d_1, d_2\}$
$d_3$ | Sub-tasks are running. | $C(p_2) = \{d_3\}$, $C(t_4) = \{d_3\}$
$d_4$ | Query result of resources. |
$d_5$ | Selection result of resources. |
$d_6$ | Null resources. | $C(p_5) = \{d_6, d_7, d_8\}$
$d_7$ | Sub-task layer. | $C(p_5) = \{d_6, d_7, d_8\}$
$d_8$ | Selection label of sub-task. | $C(p_5) = \{d_6, d_7, d_8\}$, $C(t_7) = \{d_8\}$
$d_9$ | Sub-tasks are ready for scheduling. | $C(p_6) = \{d_9\}$
$d_{10}$ | Sub-tasks are running. | $C(p_7) = \{d_{10}, d_{11}\}$, $C(t_{13}) = \{d_{10}\}$
$d_{11}$ | Sub-tasks are completed. | $C(p_7) = \{d_{10}, d_{11}\}$
$d_{12}$ | Sub-tasks are abnormal. | $C(p_8) = \{d_{12}\}$
$d_{13}$ | Sub-tasks are not labelled. |
$d_{14}$ | Sub-tasks have not been scheduled yet. |
$d_{15}$ | Sub-tasks have been scheduled. |
$d_{16}$ | All sub-tasks are completed. | $C(t_{20}) = \{d_{16}\}$
$d_{17}$ | Tasks with some sub-tasks uncompleted. | $C(p_{14}) = \{d_{16}, d_{17}\}$
$d_{18}$ | Sub-tasks are reallocated successfully. |
$d_{19}$ | Sub-tasks are reallocated unsuccessfully. |
$d_{20}$ | Sub-tasks run after reallocation. | $C(p_{13}) = \{d_{20}\}$, $C(t_{15}) = \{d_{20}\}$
$d_{21}$ | Sub-tasks clean label. |



The four-layer CHPN scheduling algorithm describes the specific implementation process from the moment a user submits a job. It uses a workflow-based approach to relate the steps and their specific properties for interrelated management, with the aim of decomposing the scheduling complexity into different levels. Different levels of granularity are therefore managed by the corresponding layers, and the process has a variable structure in which a task level can be extended [16]. The status transfer of sub-tasks is shown in Fig. 3.
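The sub-task statuses in Table III can be read as a small state machine. The sketch below is one possible reading only: the successor relation `NEXT` is an assumption made for illustration, because the arcs of Fig. 3 are not recoverable, and only a subset of the 21 colors is modelled (the color index is noted in the comments). The class and function names are hypothetical.

```python
from enum import Enum, auto


class SubTask(Enum):
    """Subset of statuses from Table III (color index in comments)."""
    NOT_INITIATED = auto()        # d1: task not initiated
    INITIATED = auto()            # d2: task initiated
    QUERY_RESOURCES = auto()      # d4
    NULL_RESOURCES = auto()       # d6
    SCHEDULED = auto()            # d15
    RUNNING = auto()              # d3 / d10
    COMPLETED = auto()            # d11
    ABNORMAL = auto()             # d12
    REALLOCATED = auto()          # d18
    REALLOCATION_FAILED = auto()  # d19


# Assumed successor relation; it is NOT taken from Fig. 3.
NEXT = {
    SubTask.NOT_INITIATED: {SubTask.INITIATED},
    SubTask.INITIATED: {SubTask.QUERY_RESOURCES},
    SubTask.QUERY_RESOURCES: {SubTask.SCHEDULED, SubTask.NULL_RESOURCES},
    SubTask.NULL_RESOURCES: {SubTask.QUERY_RESOURCES},  # retry the query
    SubTask.SCHEDULED: {SubTask.RUNNING},
    SubTask.RUNNING: {SubTask.COMPLETED, SubTask.ABNORMAL},
    SubTask.ABNORMAL: {SubTask.REALLOCATED, SubTask.REALLOCATION_FAILED},
    SubTask.REALLOCATED: {SubTask.RUNNING},              # d20: run after reallocation
    SubTask.COMPLETED: set(),
    SubTask.REALLOCATION_FAILED: set(),
}


def run(path):
    """Check each consecutive pair of statuses against NEXT; return the last one."""
    for current, following in zip(path, path[1:]):
        if following not in NEXT[current]:
            raise ValueError(f"illegal transfer {current.name} -> {following.name}")
    return path[-1]


path = [SubTask.NOT_INITIATED, SubTask.INITIATED, SubTask.QUERY_RESOURCES,
        SubTask.SCHEDULED, SubTask.RUNNING, SubTask.ABNORMAL,
        SubTask.REALLOCATED, SubTask.RUNNING, SubTask.COMPLETED]
print(run(path).name)  # COMPLETED
```

The example path passes through an abnormal run and a successful reallocation before completing; a path such as `[NOT_INITIATED, RUNNING]` is rejected with `ValueError`.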







Fig. 3. Status Transfer in Sub-task Scheduling Layer 

IV. Simulation and Analysis 

The experiments are carried out on a DMGS network based on the reachability tree model. The initial marking of this model is

$$M = \{1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0\}$$

while the final marking is

$$M_e = \{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1\}$$
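The reachability analysis can be sketched as a breadth-first search over markings that starts at M and stops when $M_e$ is found, recording the transfers fired on the way. The 12-place chain net in the example is hypothetical and only mirrors the length of the marking vectors above; the actual network used in the experiments is not reproduced here. The search assumes a bounded net, since otherwise the set of reachable markings may be infinite.

```python
from collections import deque


def reachable(m0, me, pre, post):
    """Breadth-first reachability search from marking m0 to marking me.

    Markings are tuples of token counts; pre[t] and post[t] are tuples of arc
    weights per place. Returns the firing sequence that reaches me, or None.
    Assumes a bounded net, so the set of reachable markings is finite.
    """
    parent = {m0: None}            # marking -> (previous marking, transition)
    queue = deque([m0])
    while queue:
        m = queue.popleft()
        if m == me:
            sequence = []
            while parent[m] is not None:
                m, t = parent[m]
                sequence.append(t)
            return sequence[::-1]
        for t in pre:
            if all(have >= need for have, need in zip(m, pre[t])):
                nxt = tuple(a - w + v for a, w, v in zip(m, pre[t], post[t]))
                if nxt not in parent:
                    parent[nxt] = (m, t)
                    queue.append(nxt)
    return None


# Hypothetical 12-place chain: p0 -> t0 -> p1 -> t1 -> ... -> p11.
n = 12
unit = [tuple(1 if k == i else 0 for k in range(n)) for i in range(n)]
pre = {f"t{i}": unit[i] for i in range(n - 1)}
post = {f"t{i}": unit[i + 1] for i in range(n - 1)}
M = unit[0]        # (1, 0, ..., 0)
Me = unit[n - 1]   # (0, ..., 0, 1)
print(reachable(M, Me, pre, post))  # ['t0', 't1', ..., 't10']
```

Because breadth-first search explores markings level by level, the returned sequence is a shortest firing sequence in terms of the number of transfers.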

The simulation uses 16 jobs and 15 sub-jobs with 16 input tasks, and each task contains 21 sub-tasks. The objective function is defined as

$$\max\left(C(j_1), \ldots, C(j_{16})\right)$$
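One way to evaluate this kind of objective is to compute each job's completion time and take the largest one (the makespan). The sketch below assumes greedy list scheduling on k parallel slots, standing in for a capacity of k TOKENs, and uses made-up unit durations; neither the scheduling rule nor the durations come from the experiments reported here, and the function names are hypothetical.

```python
import heapq


def completion_times(durations, k):
    """Greedy list scheduling on k parallel slots (an assumption standing in
    for a capacity of k TOKENs): each job starts on the earliest free slot."""
    free_at = [0.0] * k            # min-heap of times at which slots become free
    finished = []
    for d in durations:
        start = heapq.heappop(free_at)
        finished.append(start + d)
        heapq.heappush(free_at, start + d)
    return finished


def objective(durations, k):
    """max(C(j_1), ..., C(j_n)): the largest completion time over all jobs."""
    return max(completion_times(durations, k))


jobs = [1.0] * 16                  # hypothetical unit durations for 16 jobs
print(objective(jobs, k=4))        # 4.0
```

With 16 unit jobs and 4 slots, each slot runs four jobs back to back, so the largest completion time is 4.0.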




Fig. 4. Experiment Results: (a) Status and Time Relation; (b) Algorithm Convergence
The experimental environment is a computer with an Intel(R) Core(TM)2 CPU 6600 @ 2.40 GHz and 4.00 GB RAM (3.49 GB usable), running the 32-bit Windows 7 Enterprise operating system. The experiments are configured in Matlab 2008 with the Petri Net Toolbox for Matlab, which supports the simulation, analysis and design of discrete event systems, using the Petri net model proposed in this paper. All initializations are carried out in this toolbox, and the results are discussed in detail below.

The reachability tree model has a critical path that can generate the optimal results [2]. Matlab simulation is used in this experiment to carry out the CHPN scheduling, with the results shown in the following figures.

Fig. 4(a) compares status transfers against time cost. The time cost of the proposed algorithm lies in [1.8 s, 4.75 s]. Fig. 4(b) shows the convergence of the algorithm: the solution space is bounded, and as the problem size increases, convergence is further enhanced.

The CHPN scheduling algorithm involves a large number of calculations or transfers and long running times [17]. After the top-down hierarchical operations, parallelism is obtained through the reachability principle: starting from the initial status, certain statuses can be reached during the scheduling process [18]. The experiment uses 16 statuses, with the TOKEN subject to

$$\sum c(\cdot) = 1$$



Fig. 5 shows the deviation, marked with x, which indicates the difference from O. The total deviation is 3.4, and the deviation ratio is 3.125%. Table 4 shows the time cost and deviation obtained from the experiments.

Table IV Time Cost and Deviation

Layer Amount | Time Cost (s) | Transfer Status | Deviation Ratio
 | 170 | 32 | 5.14%
1 | 129 | 43 | 5.01%
2 | 94 | 79 | 3.78%
4 | 77 | 106 | 3.125%



Fig. 5. Deviation Analysis 



V. Conclusion 

Data mining grid scheduling (DMGS) is a dynamic scheduling problem in which job size is influenced by resources. This paper models DMGS as a single workflow using a variable-structure algorithm based on colored-hierarchy Petri nets (CHPN). The algorithm is divided into four layers: job scheduling, sub-job scheduling, task scheduling and sub-task scheduling. Because each level, from top to bottom, is based on CHPN, the scheduling complexity is alleviated. Experimental results show that the algorithm can solve the DMGS problem with acceptable effectiveness and efficiency.

In this paper, CHPN based on the actual production process is used to solve DMGS problems. Several limitations should be addressed in future work. First, this paper does not take into account uncertainties such as task deadlock and process disturbances, so it can be extended by adding these new assumptions. Second, all scheduling processes are carried out within DMGS, without considering state changes in the production processes. Finally, the actual real-time information feedback relationship is not considered. Qualitative and quantitative analyses of these aspects will be carried out in future work.